42 research outputs found

    Reduced Memory Region Based Deep Convolutional Neural Network Detection

    Accurate pedestrian detection plays a primary role in automotive safety: for example, by issuing warnings to the driver or acting on the car's brakes, it helps decrease the probability of injuries and human fatalities. To achieve very high accuracy, recent pedestrian detectors have been based on Convolutional Neural Networks (CNNs). Unfortunately, such approaches require vast amounts of computational power and memory, preventing efficient implementations on embedded systems. This work proposes a CNN-based detector that adapts a general-purpose convolutional network to the task at hand. By thoroughly analyzing and optimizing each step of the detection pipeline, we develop an architecture that outperforms methods based on traditional image features and achieves an accuracy close to the state of the art while having low computational complexity. Furthermore, the model is compressed to fit the tight constraints of low-power devices with a limited amount of embedded memory. This paper makes two main contributions: (1) it shows that a region-based deep neural network can be fine-tuned to achieve adequate accuracy for pedestrian detection; (2) it achieves very low memory usage without reducing detection accuracy on the Caltech Pedestrian dataset. Comment: IEEE 2016 ICCE-Berli

    Designing bots in games with a purpose


    Coding local and global binary visual features extracted from video sequences

    Binary local features are an effective alternative to real-valued descriptors: they lead to comparable results for many visual analysis tasks while having significantly lower computational complexity and memory requirements. When dealing with large collections, a more compact representation based on global features is often preferred, which can be obtained from local features by means of, e.g., the Bag-of-Visual-Words (BoVW) model. Several applications, including visual sensor networks and mobile augmented reality, require visual features to be transmitted over a bandwidth-limited network, thus calling for coding techniques that reduce the required bit budget while attaining a target level of efficiency. In this paper we investigate a coding scheme tailored to both local and global binary features, which exploits both spatial and temporal redundancy by means of intra- and inter-frame coding. In this respect, the proposed coding scheme can be conveniently adopted to support the Analyze-Then-Compress (ATC) paradigm. That is, visual features are extracted from the acquired content, encoded at remote nodes, and finally transmitted to a central controller that performs visual analysis. This is in contrast with the traditional approach, in which visual content is acquired at a node, compressed, and then sent to a central unit for further processing, according to the Compress-Then-Analyze (CTA) paradigm. We experimentally compare ATC and CTA by means of rate-efficiency curves in the context of two different visual analysis tasks: homography estimation and content-based retrieval. Our results show that the novel ATC paradigm based on the proposed coding primitives can be competitive with CTA, especially in bandwidth-limited scenarios. Comment: submitted to IEEE Transactions on Image Processin
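
    The idea of choosing between intra- and inter-frame coding for a binary descriptor can be illustrated with a minimal sketch. This is not the coding scheme from the paper: the 256-bit descriptor length, the function names (`encode`, `decode`, `entropy_bits`), and the cost model (an idealized entropy estimate that ignores the bits needed to signal the mode and the bit statistics) are all assumptions made for illustration. In inter mode, the descriptor is XORed with its closest match (in Hamming distance) from the previous frame, and only the sparse residual plus a reference index is coded.

```python
import math

N_BITS = 256  # assumed descriptor length; real binary descriptors vary


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def entropy_bits(ones: int, n_bits: int = N_BITS) -> float:
    """Idealized cost, in bits, of n_bits independent binary symbols
    whose fraction of ones equals ones / n_bits."""
    p = ones / n_bits
    if p == 0.0 or p == 1.0:
        return 0.0
    return n_bits * (-p * math.log2(p) - (1 - p) * math.log2(1 - p))


def encode(desc: int, references: list):
    """Pick the cheaper of intra and inter coding for one descriptor.

    Returns (mode, reference_index, payload, estimated_cost_in_bits).
    """
    intra_cost = entropy_bits(popcount(desc))
    if not references:
        return ("intra", None, desc, intra_cost)

    # Inter mode: predict from the previous-frame descriptor that is
    # closest in Hamming distance, then code only the XOR residual.
    ref_idx = min(range(len(references)),
                  key=lambda i: popcount(desc ^ references[i]))
    residual = desc ^ references[ref_idx]
    index_cost = math.ceil(math.log2(len(references))) if len(references) > 1 else 0
    inter_cost = entropy_bits(popcount(residual)) + index_cost

    if inter_cost < intra_cost:
        return ("inter", ref_idx, residual, inter_cost)
    return ("intra", None, desc, intra_cost)


def decode(mode: str, ref_idx, payload: int, references: list) -> int:
    """Invert encode(): XOR the residual back onto its reference."""
    return payload if mode == "intra" else payload ^ references[ref_idx]
```

    When a descriptor changes by only a few bits between consecutive frames, the residual is mostly zeros and its estimated cost is far below that of coding the descriptor on its own, which is the temporal redundancy that inter-frame coding aims to exploit.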

    Multi-view coding of local features in visual sensor networks

    Local visual features extracted from multiple camera views are now employed in several application scenarios, such as object recognition, disparity matching, and image stitching. In several cases, local features need to be transmitted or stored on resource-limited devices, thus calling for efficient coding techniques. While recent works have addressed the problem of efficiently compressing local features extracted from still images or video sequences, in this paper we propose and evaluate an architecture for coding features extracted from multiple, overlapping views. The proposed Multi-View Feature Coding architecture can be applied to either real-valued or binary features, and achieves bitrate reductions on the order of 10-20% with respect to simulcast coding.

    A Visual Sensor Network for Parking Lot Occupancy Detection in Smart Cities

    Technology is quickly revolutionizing our everyday lives, helping us to perform complex tasks. The Internet of Things (IoT) paradigm is becoming increasingly popular and is key to the development of Smart Cities. Among the applications of IoT in the context of Smart Cities, real-time parking lot occupancy detection has recently gained a lot of attention. Solutions based on computer vision yield good accuracy and are deployable on top of visual sensor networks. Since the problem of detecting vacant parking lots is usually distributed over multiple cameras, ad hoc algorithms for content acquisition and transmission need to be devised. The traditional paradigm consists of acquiring and encoding images or videos and transmitting them to a central controller, which is responsible for analyzing such content. A novel paradigm, which moves part of the analysis to the sensing devices, is quickly becoming popular. We propose a system for distributed parking lot occupancy detection based on the latter paradigm, showing that onboard analysis and transmission of simple features outperform the traditional paradigm in terms of overall rate-energy-accuracy performance.

    Fast keypoint detection in video sequences
